Don't zero-initialize in export_llama #16886
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16886
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit a3b33f1 with merge base 5690d26. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@JacobSzwejbka has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91518961.
Summary: Zero initialization is non-standard for PyTorch models, and with ExecuTorch in particular it is misleading because ET greedily deduplicates weights. If you zero-initialize a transformer model, the .pte file ends up much smaller than you would expect if you didn't know about the deduplication.
Differential Revision: D91518961
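A minimal sketch of the effect described above (hypothetical module and helper names; this is not the actual export_llama or ExecuTorch serialization code): every zero-initialized weight tensor has an identical value pattern, so a greedy dedup pass only needs to keep one copy, which is why the exported file can come out much smaller than a realistically initialized model would.

```python
import torch
import torch.nn as nn


class TinyBlockStack(nn.Module):
    """Hypothetical stand-in for a transformer: several identically shaped linear layers."""

    def __init__(self, num_layers: int = 4, dim: int = 64, zero_init: bool = False):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_layers)]
        )
        if zero_init:
            for layer in self.layers:
                nn.init.zeros_(layer.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


def unique_weight_buffers(model: nn.Module) -> int:
    """Rough proxy for what a greedy weight-dedup pass sees: count distinct value patterns."""
    seen = set()
    for p in model.parameters():
        seen.add(tuple(p.detach().reshape(-1).tolist()))
    return len(seen)


if __name__ == "__main__":
    zeroed = TinyBlockStack(zero_init=True)
    randomly_initialized = TinyBlockStack(zero_init=False)
    # Every zero-initialized weight tensor is value-identical, so a deduplicating
    # serializer only has to store one copy; random init needs one copy per layer.
    print("distinct buffers, zero init:  ", unique_weight_buffers(zeroed))                # -> 1
    print("distinct buffers, random init:", unique_weight_buffers(randomly_initialized))  # -> 4
```

Run as-is, the zero-initialized model collapses to a single distinct weight buffer while the randomly initialized one keeps one per layer, mirroring why the .pte size of a zero-initialized export is not representative.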
Force-pushed from 907d12e to a3b33f1